aiq <- airquality
Data Visualisation using R
Getting started with Data Visualization using R
“The simple graph has brought more information to the data analyst’s mind than any other device.” — John Tukey
Visualisation is a fundamentally human activity. A good visualisation will show you things that you did not expect, or raise new questions about the data. A good visualisation might also hint that you’re asking the wrong question, or you need to collect different data. Visualisations can surprise you, but don’t scale particularly well because they require a human to interpret them.
Graphics packages in R
There are many graphics packages in R. Some are general-purpose plotting packages; others provide specialised graphics for particular analyses.
The popular general graphics packages in R are:
graphics : a base R package
ggplot2 : a user-contributed package by Hadley Wickham
lattice : a user-contributed package
Note:
Do you remember what a package is in R?
Where can you learn more about R packages? Search for CRAN Task Views.
Except for the graphics package (a base R package), these packages need to be downloaded and installed into your R library.
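Checking which of these packages are already in your library can be sketched as follows. This is a minimal sketch: the package vector `pkgs` is an illustrative choice, and the install call is left commented out so that nothing is downloaded unless you opt in.

```r
# Packages named above that are not part of base R
pkgs <- c("ggplot2", "lattice")

# TRUE for each package already present in the local library
installed <- pkgs %in% rownames(installed.packages())
print(setNames(installed, pkgs))

# Uncomment to download and install any missing packages from CRAN
# install.packages(pkgs[!installed])

# Once installed, a package is loaded per session with library(),
# for example library(lattice)
```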
Examples of more specific packages, which produce graphics for particular analyses, are:
survminer::ggsurvplot
sjPlot
1.0 Using Basic Plot Function
Import data into R environment
Daily air quality measurements in New York, May to September 1973.
Format
A data frame with 153 observations on 6 variables.
[,1] | Ozone   | numeric | Ozone (ppb)
[,2] | Solar.R | numeric | Solar R (lang)
[,3] | Wind    | numeric | Wind (mph)
[,4] | Temp    | numeric | Temperature (degrees F)
[,5] | Month   | numeric | Month (1–12)
[,6] | Day     | numeric | Day of month (1–31)
Details
Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to September 30, 1973.
Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island
Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to 1200 hours at Central Park
Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport
Temp: Maximum daily temperature in degrees Fahrenheit at LaGuardia Airport
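The import step can be sketched as below. The airquality data set ships with base R (in the datasets package), so no file needs to be read; the name `aiq` matches the object used in the plots that follow.

```r
# Copy the built-in data set into a working object
aiq <- airquality

# Structure: 153 observations of 6 variables
str(aiq)

# Per-column summaries, including the number of missing values (NA)
summary(aiq)

# Count missing values per column before plotting
colSums(is.na(aiq))
```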
Barplot
A bar plot represents each data value as a bar whose length is proportional to that value. The bars can be drawn horizontally or vertically. Bar plots are generally used to show a numeric value for each category or observation. In barplot(), setting the horiz parameter to TRUE gives horizontal bars, and FALSE (the default) gives vertical bars.
# Horizontal bar plot of ozone concentration in air
barplot(aiq$Ozone,
        main = 'Ozone Concentration in air',
        xlab = 'Ozone Levels', horiz = TRUE)
# Vertical bar plot of ozone concentration in air
# (the values are on the y-axis here, so the label goes in ylab)
barplot(aiq$Ozone, main = 'Ozone Concentration in air',
        ylab = 'Ozone Levels', col = 'blue', horiz = FALSE)
Bar plots are used for the following scenarios:
To perform a comparative study between the various data categories in the data set.
To analyze the change of a variable over time in months or years.
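As a sketch of the first scenario, the bar plot below compares one summary value per category: the mean ozone reading for each month. The aggregation with tapply() and the object name `mean_ozone` are illustrative choices, not part of the original example.

```r
aiq <- airquality

# Mean ozone per month, ignoring missing readings
mean_ozone <- tapply(aiq$Ozone, aiq$Month, mean, na.rm = TRUE)

# Label the bars with month abbreviations (months are coded 5 to 9)
barplot(mean_ozone,
        names.arg = month.abb[as.integer(names(mean_ozone))],
        main = "Mean ozone concentration by month",
        xlab = "Month", ylab = "Ozone (ppb)",
        col = "steelblue")
```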
Histogram
A histogram resembles a bar chart in that it uses bars of varying height to represent a distribution. In a histogram, however, continuous values are grouped into consecutive intervals called bins, and the width of these bins can be varied.
# Histogram for Maximum Daily Temperature
hist(aiq$Temp, main ="Maximum Temperature(Daily)",
xlab ="Temperature(Fahrenheit)",
xlim = c(50, 125), col ="yellow",
     freq = TRUE)
For a histogram, the xlim parameter specifies the range of x values to display.
The freq parameter controls the y-axis: when TRUE, it shows the count (frequency) of values in each bin; when FALSE, it shows probability densities, scaled so that the total area of the histogram adds up to one.
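The effect of freq can be checked numerically. In this sketch, hist() is called with plot = FALSE so that it returns the bin counts and densities instead of drawing; the object names `h` and `area` are illustrative choices.

```r
aiq <- airquality

# Compute the histogram without drawing it
h <- hist(aiq$Temp, plot = FALSE)

# With freq = TRUE the bar heights are these counts; they sum to the sample size
sum(h$counts)

# With freq = FALSE the bar heights are densities; height times bin width sums to 1
area <- sum(h$density * diff(h$breaks))
area

# Draw the density version
hist(aiq$Temp, freq = FALSE,
     main = "Maximum Temperature (Daily)",
     xlab = "Temperature (Fahrenheit)", col = "yellow")
```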
Box Plot
The statistical summary of the given data is presented graphically using a boxplot. A boxplot depicts information like the minimum and maximum data point, the median value, first and third quartile, and interquartile range.
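The values that a box plot draws can also be computed directly, which helps when reading the plot. A minimal sketch using the Wind column; `wind_stats` is an illustrative name. Note that boxplot() draws the whiskers to the most extreme points within 1.5 times the interquartile range of the box, so they match the minimum and maximum only when there are no outliers.

```r
aiq <- airquality

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(aiq$Wind)

# Interquartile range (third quartile minus first quartile)
IQR(aiq$Wind)

# The statistics boxplot() actually uses: whisker ends, hinges and median,
# plus any points drawn individually as outliers
wind_stats <- boxplot.stats(aiq$Wind)
wind_stats$stats
wind_stats$out
```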
# Box plot for average wind speed
boxplot(aiq$Wind, main = "Average wind speed",
xlab = "Miles per hour", ylab = "Wind",
col = "orange", border = "black",
        horizontal = TRUE)
Multiple box plots can also be generated at once through the following code:
# Multiple Box plots, each representing
# an Air Quality Parameter
boxplot(aiq[, 1:4],
        main = 'Box Plots for Air Quality Parameters')
Scatter Plot
A scatter plot is composed of many points on a Cartesian plane. Each point denotes the value taken by two parameters and helps us easily identify the relationship between them.
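A correlation coefficient is a useful numeric companion to a scatter plot. The sketch below computes the Pearson correlation between ozone and temperature; use = "complete.obs" drops the days on which either value is missing. The name `r_ozone_temp` is an illustrative choice.

```r
aiq <- airquality

# Pearson correlation, using only rows where both values are present
r_ozone_temp <- cor(aiq$Ozone, aiq$Temp, use = "complete.obs")
r_ozone_temp
```

A value near 1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship.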
# Scatter plot of ozone concentration against temperature
plot(aiq$Ozone, aiq$Temp,
     main = "Scatterplot Example",
     xlab = "Ozone Concentration in parts per billion",
     ylab = "Temperature", pch = 19)
plot(aiq) # scatter plot matrix of all variables
2.0 Using ggplot2 function
ggplot2 is an R package dedicated to data visualization. It can greatly improve the quality and aesthetics of your graphics, and will make you much more efficient in creating them.
To work with ggplot2, remember:
start with: ggplot()
which data: data = X
which variables: aes(x = , y = )
which graph: geom_histogram(), geom_point()
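Put together, the four pieces above form a complete plot. A minimal sketch, assuming ggplot2 is installed; the airquality data and the bin width of 5 are illustrative choices.

```r
library(ggplot2)

# Start with ggplot(), say which data and which variables,
# then add a geom to say which graph
p <- ggplot(data = airquality, aes(x = Temp)) +
  geom_histogram(binwidth = 5, fill = "yellow", colour = "black")

print(p)
```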
The official website for ggplot2 is here http://ggplot2.org/.
Load Package
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gcookbook)
The single line library(tidyverse) loads the core tidyverse: packages which you will use in almost every data analysis. The gcookbook package supplies example data sets used below, such as heightweight.
Data Frame
heightweight
    sex ageYear ageMonth heightIn weightLb
1 f 11.92 143 56.3 85.0
2 f 12.92 155 62.3 105.0
3 f 12.75 153 63.3 108.0
4 f 13.42 161 59.0 92.0
5 f 15.92 191 62.5 112.5
6 f 14.25 171 62.5 112.0
7 f 15.42 185 59.0 104.0
8 f 11.83 142 56.5 69.0
9 f 13.33 160 62.0 94.5
10 f 11.67 140 53.8 68.5
11 f 11.58 139 61.5 104.0
12 f 14.83 178 61.5 103.5
13 f 13.08 157 64.5 123.5
14 f 12.42 149 58.3 93.0
15 f 11.92 143 51.3 50.5
16 f 12.08 145 58.8 89.0
17 f 15.92 191 65.3 107.0
18 f 12.50 150 59.5 78.5
19 f 12.25 147 61.3 115.0
20 f 15.00 180 63.3 114.0
21 f 11.75 141 61.8 85.0
22 f 11.67 140 53.5 81.0
23 f 13.67 164 58.0 83.5
24 f 14.67 176 61.3 112.0
25 f 15.42 185 63.3 101.0
26 f 13.83 166 61.5 103.5
27 f 14.58 175 60.8 93.5
28 f 15.00 180 59.0 112.0
29 f 17.50 210 65.5 140.0
30 f 12.17 146 56.3 83.5
31 f 14.17 170 64.3 90.0
32 f 13.50 162 58.0 84.0
33 f 12.42 149 64.3 110.5
34 f 11.58 139 57.5 96.0
35 f 15.50 186 57.8 95.0
36 f 16.42 197 61.5 121.0
37 f 14.08 169 62.3 99.5
38 f 14.75 177 61.8 142.5
39 f 15.42 185 65.3 118.0
40 f 15.17 182 58.3 104.5
41 f 14.42 173 62.8 102.5
42 f 13.83 166 59.3 89.5
43 f 14.00 168 61.5 95.0
44 f 14.08 169 62.0 98.5
45 f 12.50 150 61.3 94.0
46 f 15.33 184 62.3 108.0
47 f 11.58 139 52.8 63.5
48 f 12.25 147 59.8 84.5
49 f 12.00 144 59.5 93.5
50 f 14.75 177 61.3 112.0
51 f 14.83 178 63.5 148.5
52 f 16.42 197 64.8 112.0
53 f 12.17 146 60.0 109.0
54 f 12.08 145 59.0 91.5
55 f 12.25 147 55.8 75.0
56 f 12.08 145 57.8 84.0
57 f 12.92 155 61.3 107.0
58 f 13.92 167 62.3 92.5
59 f 15.25 183 64.3 109.5
60 f 11.92 143 55.5 84.0
61 f 15.25 183 64.5 102.5
62 f 15.42 185 60.0 106.0
63 f 12.33 148 56.3 77.0
64 f 12.25 147 58.3 111.5
65 f 12.83 154 60.0 114.0
66 f 13.00 156 54.5 75.0
67 f 12.00 144 55.8 73.5
68 f 12.83 154 62.8 93.5
69 f 12.67 152 60.5 105.0
70 f 15.92 191 63.3 113.5
71 f 15.83 190 66.8 140.0
72 f 11.67 140 60.0 77.0
73 f 12.33 148 60.5 84.5
74 f 15.75 189 64.3 113.5
75 f 11.92 143 58.3 77.5
76 f 14.83 178 66.5 117.5
77 f 13.67 164 65.3 98.0
78 f 13.08 157 60.5 112.0
79 f 12.25 147 59.5 101.0
80 f 12.33 148 59.0 95.0
81 f 14.75 177 61.3 81.0
82 f 14.25 171 61.5 91.0
83 f 14.33 172 64.8 142.0
84 f 15.83 190 56.8 98.5
85 f 15.25 183 66.5 112.0
86 f 11.92 143 61.5 116.5
87 f 14.92 179 63.0 98.5
88 f 15.50 186 57.0 83.5
89 f 15.17 182 65.5 133.0
90 f 15.17 182 62.0 91.5
91 f 11.83 142 56.0 72.5
92 f 13.75 165 61.3 106.5
93 f 13.75 165 55.5 67.0
94 f 12.83 154 61.0 122.5
95 f 12.50 150 54.5 74.0
96 f 12.92 155 66.0 144.5
97 f 13.58 163 56.5 84.0
98 f 11.75 141 56.0 72.5
99 f 12.25 147 51.5 64.0
100 f 17.50 210 62.0 116.0
101 f 14.25 171 63.0 84.0
102 f 13.92 167 61.0 93.5
103 f 15.17 182 64.0 111.5
104 f 12.00 144 61.0 92.0
105 f 16.08 193 59.8 115.0
106 f 11.75 141 61.3 85.0
107 f 13.67 164 63.3 108.0
108 f 15.50 186 63.5 108.0
109 f 14.08 169 61.5 85.0
110 f 14.58 175 60.3 86.0
111 f 15.00 180 61.3 110.5
112 m 13.75 165 64.8 98.0
113 m 13.08 157 60.5 105.0
114 m 12.00 144 57.3 76.5
115 m 12.50 150 59.5 84.0
116 m 12.50 150 60.8 128.0
117 m 11.58 139 60.5 87.0
118 m 15.75 189 67.0 128.0
119 m 15.25 183 64.8 111.0
120 m 12.25 147 50.5 79.0
121 m 12.17 146 57.5 90.0
122 m 13.33 160 60.5 84.0
123 m 13.00 156 61.8 112.0
124 m 14.42 173 61.3 93.0
125 m 12.58 151 66.3 117.0
126 m 11.75 141 53.3 84.0
127 m 12.50 150 59.0 99.5
128 m 13.67 164 57.8 95.0
129 m 12.75 153 60.0 84.0
130 m 17.17 206 68.3 134.0
132 m 14.67 176 63.8 98.5
133 m 14.67 176 65.0 118.5
134 m 11.67 140 59.5 94.5
135 m 15.42 185 66.0 105.0
136 m 15.00 180 61.8 104.0
137 m 12.17 146 57.3 83.0
138 m 15.25 183 66.0 105.5
139 m 11.67 140 56.5 84.0
140 m 12.58 151 58.3 86.0
141 m 12.58 151 61.0 81.0
142 m 12.00 144 62.8 94.0
143 m 13.33 160 59.3 78.5
144 m 14.83 178 67.3 119.5
145 m 16.08 193 66.3 133.0
146 m 13.50 162 64.5 119.0
147 m 13.67 164 60.5 95.0
148 m 15.50 186 66.0 112.0
149 m 11.92 143 57.5 75.0
150 m 14.58 175 64.0 92.0
151 m 14.58 175 68.0 112.0
152 m 14.58 175 63.5 98.5
153 m 14.42 173 69.0 112.5
154 m 14.17 170 63.8 112.5
155 m 14.50 174 66.0 108.0
156 m 13.67 164 63.5 108.0
157 m 12.00 144 59.5 88.0
158 m 13.00 156 66.3 106.0
159 m 12.42 149 57.0 92.0
160 m 12.00 144 60.0 117.5
161 m 12.25 147 57.0 84.0
162 m 15.67 188 67.3 112.0
163 m 14.08 169 62.0 100.0
164 m 14.33 172 65.0 112.0
165 m 12.50 150 59.5 84.0
166 m 16.08 193 67.8 127.5
167 m 13.08 157 58.0 80.5
168 m 14.00 168 60.0 93.5
169 m 11.67 140 58.5 86.5
170 m 13.00 156 58.3 92.5
171 m 13.00 156 61.5 108.5
172 m 13.17 158 65.0 121.0
173 m 15.33 184 66.5 112.0
174 m 13.00 156 68.5 114.0
175 m 12.00 144 57.0 84.0
176 m 14.67 176 61.5 81.0
177 m 14.00 168 66.5 111.5
178 m 12.42 149 52.5 81.0
179 m 11.83 142 55.0 70.0
180 m 15.67 188 71.0 140.0
181 m 16.92 203 66.5 117.0
182 m 11.83 142 58.8 84.0
183 m 15.75 189 66.3 112.0
184 m 15.67 188 65.8 150.5
185 m 16.67 200 71.0 147.0
186 m 12.67 152 59.5 105.0
187 m 14.50 174 69.8 119.5
188 m 13.83 166 62.5 84.0
189 m 12.08 145 56.5 91.0
190 m 11.92 143 57.5 101.0
191 m 13.58 163 65.3 117.5
192 m 13.83 166 67.3 121.0
193 m 15.17 182 67.0 133.0
194 m 14.42 173 66.0 112.0
195 m 12.92 155 61.8 91.5
196 m 13.50 162 60.0 105.0
197 m 14.75 177 63.0 111.0
198 m 14.75 177 60.5 112.0
199 m 14.58 175 65.5 114.0
200 m 13.83 166 62.0 91.0
201 m 12.50 150 59.0 98.0
202 m 12.50 150 61.8 118.0
203 m 15.67 188 63.3 115.5
204 m 13.58 163 66.0 112.0
205 m 14.25 171 61.8 112.0
206 m 13.50 162 63.0 91.0
207 m 11.75 141 57.5 85.0
208 m 14.50 174 63.0 112.0
209 m 11.83 142 56.0 87.5
210 m 12.33 148 60.5 118.0
211 m 11.67 140 56.8 83.5
212 m 13.33 160 64.0 116.0
213 m 12.00 144 60.0 89.0
214 m 17.17 206 69.5 171.5
215 m 13.25 159 63.3 112.0
216 m 12.42 149 56.3 72.0
217 m 16.08 193 72.0 150.0
218 m 16.17 194 65.3 134.5
219 m 12.67 152 60.8 97.0
220 m 12.17 146 55.0 71.5
221 m 11.58 139 55.0 73.5
222 m 15.50 186 66.5 112.0
223 m 13.42 161 56.8 75.0
224 m 12.75 153 64.8 128.0
225 m 16.33 196 64.5 98.0
226 m 13.67 164 58.0 84.0
227 m 13.25 159 62.8 99.0
228 m 14.83 178 63.8 112.0
229 m 12.75 153 57.8 79.5
230 m 12.92 155 57.3 80.5
231 m 14.83 178 63.5 102.5
232 m 11.83 142 55.0 76.0
233 m 13.67 164 66.5 112.0
234 m 15.75 189 65.0 114.0
235 m 13.67 164 61.5 140.0
236 m 13.92 167 62.0 107.5
237 m 12.58 151 59.3 87.0
2.1 Scatter Plot
Scatter plots are used to display the relationship between two continuous variables. In a scatter plot, each observation in a data set is represented by a point.
Basic Scatter Plot
heightweight %>%
  select(ageYear, heightIn)
    ageYear heightIn
1 11.92 56.3
2 12.92 62.3
3 12.75 63.3
4 13.42 59.0
5 15.92 62.5
6 14.25 62.5
7 15.42 59.0
8 11.83 56.5
9 13.33 62.0
10 11.67 53.8
11 11.58 61.5
12 14.83 61.5
13 13.08 64.5
14 12.42 58.3
15 11.92 51.3
16 12.08 58.8
17 15.92 65.3
18 12.50 59.5
19 12.25 61.3
20 15.00 63.3
21 11.75 61.8
22 11.67 53.5
23 13.67 58.0
24 14.67 61.3
25 15.42 63.3
26 13.83 61.5
27 14.58 60.8
28 15.00 59.0
29 17.50 65.5
30 12.17 56.3
31 14.17 64.3
32 13.50 58.0
33 12.42 64.3
34 11.58 57.5
35 15.50 57.8
36 16.42 61.5
37 14.08 62.3
38 14.75 61.8
39 15.42 65.3
40 15.17 58.3
41 14.42 62.8
42 13.83 59.3
43 14.00 61.5
44 14.08 62.0
45 12.50 61.3
46 15.33 62.3
47 11.58 52.8
48 12.25 59.8
49 12.00 59.5
50 14.75 61.3
51 14.83 63.5
52 16.42 64.8
53 12.17 60.0
54 12.08 59.0
55 12.25 55.8
56 12.08 57.8
57 12.92 61.3
58 13.92 62.3
59 15.25 64.3
60 11.92 55.5
61 15.25 64.5
62 15.42 60.0
63 12.33 56.3
64 12.25 58.3
65 12.83 60.0
66 13.00 54.5
67 12.00 55.8
68 12.83 62.8
69 12.67 60.5
70 15.92 63.3
71 15.83 66.8
72 11.67 60.0
73 12.33 60.5
74 15.75 64.3
75 11.92 58.3
76 14.83 66.5
77 13.67 65.3
78 13.08 60.5
79 12.25 59.5
80 12.33 59.0
81 14.75 61.3
82 14.25 61.5
83 14.33 64.8
84 15.83 56.8
85 15.25 66.5
86 11.92 61.5
87 14.92 63.0
88 15.50 57.0
89 15.17 65.5
90 15.17 62.0
91 11.83 56.0
92 13.75 61.3
93 13.75 55.5
94 12.83 61.0
95 12.50 54.5
96 12.92 66.0
97 13.58 56.5
98 11.75 56.0
99 12.25 51.5
100 17.50 62.0
101 14.25 63.0
102 13.92 61.0
103 15.17 64.0
104 12.00 61.0
105 16.08 59.8
106 11.75 61.3
107 13.67 63.3
108 15.50 63.5
109 14.08 61.5
110 14.58 60.3
111 15.00 61.3
112 13.75 64.8
113 13.08 60.5
114 12.00 57.3
115 12.50 59.5
116 12.50 60.8
117 11.58 60.5
118 15.75 67.0
119 15.25 64.8
120 12.25 50.5
121 12.17 57.5
122 13.33 60.5
123 13.00 61.8
124 14.42 61.3
125 12.58 66.3
126 11.75 53.3
127 12.50 59.0
128 13.67 57.8
129 12.75 60.0
130 17.17 68.3
132 14.67 63.8
133 14.67 65.0
134 11.67 59.5
135 15.42 66.0
136 15.00 61.8
137 12.17 57.3
138 15.25 66.0
139 11.67 56.5
140 12.58 58.3
141 12.58 61.0
142 12.00 62.8
143 13.33 59.3
144 14.83 67.3
145 16.08 66.3
146 13.50 64.5
147 13.67 60.5
148 15.50 66.0
149 11.92 57.5
150 14.58 64.0
151 14.58 68.0
152 14.58 63.5
153 14.42 69.0
154 14.17 63.8
155 14.50 66.0
156 13.67 63.5
157 12.00 59.5
158 13.00 66.3
159 12.42 57.0
160 12.00 60.0
161 12.25 57.0
162 15.67 67.3
163 14.08 62.0
164 14.33 65.0
165 12.50 59.5
166 16.08 67.8
167 13.08 58.0
168 14.00 60.0
169 11.67 58.5
170 13.00 58.3
171 13.00 61.5
172 13.17 65.0
173 15.33 66.5
174 13.00 68.5
175 12.00 57.0
176 14.67 61.5
177 14.00 66.5
178 12.42 52.5
179 11.83 55.0
180 15.67 71.0
181 16.92 66.5
182 11.83 58.8
183 15.75 66.3
184 15.67 65.8
185 16.67 71.0
186 12.67 59.5
187 14.50 69.8
188 13.83 62.5
189 12.08 56.5
190 11.92 57.5
191 13.58 65.3
192 13.83 67.3
193 15.17 67.0
194 14.42 66.0
195 12.92 61.8
196 13.50 60.0
197 14.75 63.0
198 14.75 60.5
199 14.58 65.5
200 13.83 62.0
201 12.50 59.0
202 12.50 61.8
203 15.67 63.3
204 13.58 66.0
205 14.25 61.8
206 13.50 63.0
207 11.75 57.5
208 14.50 63.0
209 11.83 56.0
210 12.33 60.5
211 11.67 56.8
212 13.33 64.0
213 12.00 60.0
214 17.17 69.5
215 13.25 63.3
216 12.42 56.3
217 16.08 72.0
218 16.17 65.3
219 12.67 60.8
220 12.17 55.0
221 11.58 55.0
222 15.50 66.5
223 13.42 56.8
224 12.75 64.8
225 16.33 64.5
226 13.67 58.0
227 13.25 62.8
228 14.83 63.8
229 12.75 57.8
230 12.92 57.3
231 14.83 63.5
232 11.83 55.0
233 13.67 66.5
234 15.75 65.0
235 13.67 61.5
236 13.92 62.0
237 12.58 59.3
ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point()
Use a Different Shape for the Points
ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point(shape = 21)
Grouping Points Together using Shapes or Colors
You want to visually group points by some variable (the grouping variable), using different shapes or colors.
heightweight %>%
  select(sex, ageYear, heightIn)
    sex ageYear heightIn
1 f 11.92 56.3
2 f 12.92 62.3
3 f 12.75 63.3
4 f 13.42 59.0
5 f 15.92 62.5
6 f 14.25 62.5
7 f 15.42 59.0
8 f 11.83 56.5
9 f 13.33 62.0
10 f 11.67 53.8
11 f 11.58 61.5
12 f 14.83 61.5
13 f 13.08 64.5
14 f 12.42 58.3
15 f 11.92 51.3
16 f 12.08 58.8
17 f 15.92 65.3
18 f 12.50 59.5
19 f 12.25 61.3
20 f 15.00 63.3
21 f 11.75 61.8
22 f 11.67 53.5
23 f 13.67 58.0
24 f 14.67 61.3
25 f 15.42 63.3
26 f 13.83 61.5
27 f 14.58 60.8
28 f 15.00 59.0
29 f 17.50 65.5
30 f 12.17 56.3
31 f 14.17 64.3
32 f 13.50 58.0
33 f 12.42 64.3
34 f 11.58 57.5
35 f 15.50 57.8
36 f 16.42 61.5
37 f 14.08 62.3
38 f 14.75 61.8
39 f 15.42 65.3
40 f 15.17 58.3
41 f 14.42 62.8
42 f 13.83 59.3
43 f 14.00 61.5
44 f 14.08 62.0
45 f 12.50 61.3
46 f 15.33 62.3
47 f 11.58 52.8
48 f 12.25 59.8
49 f 12.00 59.5
50 f 14.75 61.3
51 f 14.83 63.5
52 f 16.42 64.8
53 f 12.17 60.0
54 f 12.08 59.0
55 f 12.25 55.8
56 f 12.08 57.8
57 f 12.92 61.3
58 f 13.92 62.3
59 f 15.25 64.3
60 f 11.92 55.5
61 f 15.25 64.5
62 f 15.42 60.0
63 f 12.33 56.3
64 f 12.25 58.3
65 f 12.83 60.0
66 f 13.00 54.5
67 f 12.00 55.8
68 f 12.83 62.8
69 f 12.67 60.5
70 f 15.92 63.3
71 f 15.83 66.8
72 f 11.67 60.0
73 f 12.33 60.5
74 f 15.75 64.3
75 f 11.92 58.3
76 f 14.83 66.5
77 f 13.67 65.3
78 f 13.08 60.5
79 f 12.25 59.5
80 f 12.33 59.0
81 f 14.75 61.3
82 f 14.25 61.5
83 f 14.33 64.8
84 f 15.83 56.8
85 f 15.25 66.5
86 f 11.92 61.5
87 f 14.92 63.0
88 f 15.50 57.0
89 f 15.17 65.5
90 f 15.17 62.0
91 f 11.83 56.0
92 f 13.75 61.3
93 f 13.75 55.5
94 f 12.83 61.0
95 f 12.50 54.5
96 f 12.92 66.0
97 f 13.58 56.5
98 f 11.75 56.0
99 f 12.25 51.5
100 f 17.50 62.0
101 f 14.25 63.0
102 f 13.92 61.0
103 f 15.17 64.0
104 f 12.00 61.0
105 f 16.08 59.8
106 f 11.75 61.3
107 f 13.67 63.3
108 f 15.50 63.5
109 f 14.08 61.5
110 f 14.58 60.3
111 f 15.00 61.3
112 m 13.75 64.8
113 m 13.08 60.5
114 m 12.00 57.3
115 m 12.50 59.5
116 m 12.50 60.8
117 m 11.58 60.5
118 m 15.75 67.0
119 m 15.25 64.8
120 m 12.25 50.5
121 m 12.17 57.5
122 m 13.33 60.5
123 m 13.00 61.8
124 m 14.42 61.3
125 m 12.58 66.3
126 m 11.75 53.3
127 m 12.50 59.0
128 m 13.67 57.8
129 m 12.75 60.0
130 m 17.17 68.3
132 m 14.67 63.8
133 m 14.67 65.0
134 m 11.67 59.5
135 m 15.42 66.0
136 m 15.00 61.8
137 m 12.17 57.3
138 m 15.25 66.0
139 m 11.67 56.5
140 m 12.58 58.3
141 m 12.58 61.0
142 m 12.00 62.8
143 m 13.33 59.3
144 m 14.83 67.3
145 m 16.08 66.3
146 m 13.50 64.5
147 m 13.67 60.5
148 m 15.50 66.0
149 m 11.92 57.5
150 m 14.58 64.0
151 m 14.58 68.0
152 m 14.58 63.5
153 m 14.42 69.0
154 m 14.17 63.8
155 m 14.50 66.0
156 m 13.67 63.5
157 m 12.00 59.5
158 m 13.00 66.3
159 m 12.42 57.0
160 m 12.00 60.0
161 m 12.25 57.0
162 m 15.67 67.3
163 m 14.08 62.0
164 m 14.33 65.0
165 m 12.50 59.5
166 m 16.08 67.8
167 m 13.08 58.0
168 m 14.00 60.0
169 m 11.67 58.5
170 m 13.00 58.3
171 m 13.00 61.5
172 m 13.17 65.0
173 m 15.33 66.5
174 m 13.00 68.5
175 m 12.00 57.0
176 m 14.67 61.5
177 m 14.00 66.5
178 m 12.42 52.5
179 m 11.83 55.0
180 m 15.67 71.0
181 m 16.92 66.5
182 m 11.83 58.8
183 m 15.75 66.3
184 m 15.67 65.8
185 m 16.67 71.0
186 m 12.67 59.5
187 m 14.50 69.8
188 m 13.83 62.5
189 m 12.08 56.5
190 m 11.92 57.5
191 m 13.58 65.3
192 m 13.83 67.3
193 m 15.17 67.0
194 m 14.42 66.0
195 m 12.92 61.8
196 m 13.50 60.0
197 m 14.75 63.0
198 m 14.75 60.5
199 m 14.58 65.5
200 m 13.83 62.0
201 m 12.50 59.0
202 m 12.50 61.8
203 m 15.67 63.3
204 m 13.58 66.0
205 m 14.25 61.8
206 m 13.50 63.0
207 m 11.75 57.5
208 m 14.50 63.0
209 m 11.83 56.0
210 m 12.33 60.5
211 m 11.67 56.8
212 m 13.33 64.0
213 m 12.00 60.0
214 m 17.17 69.5
215 m 13.25 63.3
216 m 12.42 56.3
217 m 16.08 72.0
218 m 16.17 65.3
219 m 12.67 60.8
220 m 12.17 55.0
221 m 11.58 55.0
222 m 15.50 66.5
223 m 13.42 56.8
224 m 12.75 64.8
225 m 16.33 64.5
226 m 13.67 58.0
227 m 13.25 62.8
228 m 14.83 63.8
229 m 12.75 57.8
230 m 12.92 57.3
231 m 14.83 63.5
232 m 11.83 55.0
233 m 13.67 66.5
234 m 15.75 65.0
235 m 13.67 61.5
236 m 13.92 62.0
237 m 12.58 59.3
ggplot(heightweight, aes(x = ageYear, y = heightIn, shape = sex, colour = sex)) +
  geom_point()
Apply facet_wrap to view the data by a categorical variable (split your plot)
ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = sex)) +
  geom_point() +
  facet_wrap(~sex)
Mapping a Continuous Variable to Color or Size
A basic scatter plot shows the relationship between two continuous variables: one mapped to the x-axis, and one to the y-axis. When there are more than two continuous variables, these additional variables must be mapped to other aesthetics, like size and color.
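Both aesthetics can be used at once. A sketch, assuming ggplot2 is installed and using the airquality data from section 1.0 so that it runs without extra packages: Temp is mapped to colour and Solar.R to size. Dropping rows with missing values first is an illustrative choice that avoids warnings.

```r
library(ggplot2)

# Keep only complete rows so every point has a colour and a size
aq <- na.omit(airquality)

p <- ggplot(aq, aes(x = Wind, y = Ozone, colour = Temp, size = Solar.R)) +
  geom_point(alpha = 0.6)

print(p)
```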
heightweight %>%
  select(sex, ageYear, heightIn, weightLb)
    sex ageYear heightIn weightLb
1 f 11.92 56.3 85.0
2 f 12.92 62.3 105.0
3 f 12.75 63.3 108.0
4 f 13.42 59.0 92.0
5 f 15.92 62.5 112.5
6 f 14.25 62.5 112.0
7 f 15.42 59.0 104.0
8 f 11.83 56.5 69.0
9 f 13.33 62.0 94.5
10 f 11.67 53.8 68.5
11 f 11.58 61.5 104.0
12 f 14.83 61.5 103.5
13 f 13.08 64.5 123.5
14 f 12.42 58.3 93.0
15 f 11.92 51.3 50.5
16 f 12.08 58.8 89.0
17 f 15.92 65.3 107.0
18 f 12.50 59.5 78.5
19 f 12.25 61.3 115.0
20 f 15.00 63.3 114.0
21 f 11.75 61.8 85.0
22 f 11.67 53.5 81.0
23 f 13.67 58.0 83.5
24 f 14.67 61.3 112.0
25 f 15.42 63.3 101.0
26 f 13.83 61.5 103.5
27 f 14.58 60.8 93.5
28 f 15.00 59.0 112.0
29 f 17.50 65.5 140.0
30 f 12.17 56.3 83.5
31 f 14.17 64.3 90.0
32 f 13.50 58.0 84.0
33 f 12.42 64.3 110.5
34 f 11.58 57.5 96.0
35 f 15.50 57.8 95.0
36 f 16.42 61.5 121.0
37 f 14.08 62.3 99.5
38 f 14.75 61.8 142.5
39 f 15.42 65.3 118.0
40 f 15.17 58.3 104.5
41 f 14.42 62.8 102.5
42 f 13.83 59.3 89.5
43 f 14.00 61.5 95.0
44 f 14.08 62.0 98.5
45 f 12.50 61.3 94.0
46 f 15.33 62.3 108.0
47 f 11.58 52.8 63.5
48 f 12.25 59.8 84.5
49 f 12.00 59.5 93.5
50 f 14.75 61.3 112.0
51 f 14.83 63.5 148.5
52 f 16.42 64.8 112.0
53 f 12.17 60.0 109.0
54 f 12.08 59.0 91.5
55 f 12.25 55.8 75.0
56 f 12.08 57.8 84.0
57 f 12.92 61.3 107.0
58 f 13.92 62.3 92.5
59 f 15.25 64.3 109.5
60 f 11.92 55.5 84.0
61 f 15.25 64.5 102.5
62 f 15.42 60.0 106.0
63 f 12.33 56.3 77.0
64 f 12.25 58.3 111.5
65 f 12.83 60.0 114.0
66 f 13.00 54.5 75.0
67 f 12.00 55.8 73.5
68 f 12.83 62.8 93.5
69 f 12.67 60.5 105.0
70 f 15.92 63.3 113.5
71 f 15.83 66.8 140.0
72 f 11.67 60.0 77.0
73 f 12.33 60.5 84.5
74 f 15.75 64.3 113.5
75 f 11.92 58.3 77.5
76 f 14.83 66.5 117.5
77 f 13.67 65.3 98.0
78 f 13.08 60.5 112.0
79 f 12.25 59.5 101.0
80 f 12.33 59.0 95.0
81 f 14.75 61.3 81.0
82 f 14.25 61.5 91.0
83 f 14.33 64.8 142.0
84 f 15.83 56.8 98.5
85 f 15.25 66.5 112.0
86 f 11.92 61.5 116.5
87 f 14.92 63.0 98.5
88 f 15.50 57.0 83.5
89 f 15.17 65.5 133.0
90 f 15.17 62.0 91.5
91 f 11.83 56.0 72.5
92 f 13.75 61.3 106.5
93 f 13.75 55.5 67.0
94 f 12.83 61.0 122.5
95 f 12.50 54.5 74.0
96 f 12.92 66.0 144.5
97 f 13.58 56.5 84.0
98 f 11.75 56.0 72.5
99 f 12.25 51.5 64.0
100 f 17.50 62.0 116.0
101 f 14.25 63.0 84.0
102 f 13.92 61.0 93.5
103 f 15.17 64.0 111.5
104 f 12.00 61.0 92.0
105 f 16.08 59.8 115.0
106 f 11.75 61.3 85.0
107 f 13.67 63.3 108.0
108 f 15.50 63.5 108.0
109 f 14.08 61.5 85.0
110 f 14.58 60.3 86.0
111 f 15.00 61.3 110.5
112 m 13.75 64.8 98.0
113 m 13.08 60.5 105.0
114 m 12.00 57.3 76.5
115 m 12.50 59.5 84.0
116 m 12.50 60.8 128.0
117 m 11.58 60.5 87.0
118 m 15.75 67.0 128.0
119 m 15.25 64.8 111.0
120 m 12.25 50.5 79.0
121 m 12.17 57.5 90.0
122 m 13.33 60.5 84.0
123 m 13.00 61.8 112.0
124 m 14.42 61.3 93.0
125 m 12.58 66.3 117.0
126 m 11.75 53.3 84.0
127 m 12.50 59.0 99.5
128 m 13.67 57.8 95.0
129 m 12.75 60.0 84.0
130 m 17.17 68.3 134.0
132 m 14.67 63.8 98.5
133 m 14.67 65.0 118.5
134 m 11.67 59.5 94.5
135 m 15.42 66.0 105.0
136 m 15.00 61.8 104.0
137 m 12.17 57.3 83.0
138 m 15.25 66.0 105.5
139 m 11.67 56.5 84.0
140 m 12.58 58.3 86.0
141 m 12.58 61.0 81.0
142 m 12.00 62.8 94.0
143 m 13.33 59.3 78.5
144 m 14.83 67.3 119.5
145 m 16.08 66.3 133.0
146 m 13.50 64.5 119.0
147 m 13.67 60.5 95.0
148 m 15.50 66.0 112.0
149 m 11.92 57.5 75.0
150 m 14.58 64.0 92.0
151 m 14.58 68.0 112.0
152 m 14.58 63.5 98.5
153 m 14.42 69.0 112.5
154 m 14.17 63.8 112.5
155 m 14.50 66.0 108.0
156 m 13.67 63.5 108.0
157 m 12.00 59.5 88.0
158 m 13.00 66.3 106.0
159 m 12.42 57.0 92.0
160 m 12.00 60.0 117.5
161 m 12.25 57.0 84.0
162 m 15.67 67.3 112.0
163 m 14.08 62.0 100.0
164 m 14.33 65.0 112.0
165 m 12.50 59.5 84.0
166 m 16.08 67.8 127.5
167 m 13.08 58.0 80.5
168 m 14.00 60.0 93.5
169 m 11.67 58.5 86.5
170 m 13.00 58.3 92.5
171 m 13.00 61.5 108.5
172 m 13.17 65.0 121.0
173 m 15.33 66.5 112.0
174 m 13.00 68.5 114.0
175 m 12.00 57.0 84.0
176 m 14.67 61.5 81.0
177 m 14.00 66.5 111.5
178 m 12.42 52.5 81.0
179 m 11.83 55.0 70.0
180 m 15.67 71.0 140.0
181 m 16.92 66.5 117.0
182 m 11.83 58.8 84.0
183 m 15.75 66.3 112.0
184 m 15.67 65.8 150.5
185 m 16.67 71.0 147.0
186 m 12.67 59.5 105.0
187 m 14.50 69.8 119.5
188 m 13.83 62.5 84.0
189 m 12.08 56.5 91.0
190 m 11.92 57.5 101.0
191 m 13.58 65.3 117.5
192 m 13.83 67.3 121.0
193 m 15.17 67.0 133.0
194 m 14.42 66.0 112.0
195 m 12.92 61.8 91.5
196 m 13.50 60.0 105.0
197 m 14.75 63.0 111.0
198 m 14.75 60.5 112.0
199 m 14.58 65.5 114.0
200 m 13.83 62.0 91.0
201 m 12.50 59.0 98.0
202 m 12.50 61.8 118.0
203 m 15.67 63.3 115.5
204 m 13.58 66.0 112.0
205 m 14.25 61.8 112.0
206 m 13.50 63.0 91.0
207 m 11.75 57.5 85.0
208 m 14.50 63.0 112.0
209 m 11.83 56.0 87.5
210 m 12.33 60.5 118.0
211 m 11.67 56.8 83.5
212 m 13.33 64.0 116.0
213 m 12.00 60.0 89.0
214 m 17.17 69.5 171.5
215 m 13.25 63.3 112.0
216 m 12.42 56.3 72.0
217 m 16.08 72.0 150.0
218 m 16.17 65.3 134.5
219 m 12.67 60.8 97.0
220 m 12.17 55.0 71.5
221 m 11.58 55.0 73.5
222 m 15.50 66.5 112.0
223 m 13.42 56.8 75.0
224 m 12.75 64.8 128.0
225 m 16.33 64.5 98.0
226 m 13.67 58.0 84.0
227 m 13.25 62.8 99.0
228 m 14.83 63.8 112.0
229 m 12.75 57.8 79.5
230 m 12.92 57.3 80.5
231 m 14.83 63.5 102.5
232 m 11.83 55.0 76.0
233 m 13.67 66.5 112.0
234 m 15.75 65.0 114.0
235 m 13.67 61.5 140.0
236 m 13.92 62.0 107.5
237 m 12.58 59.3 87.0
ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = weightLb)) +
  geom_point()
Adding Fitted Lines
You want to add lines from a fitted regression model to a scatter plot.
# We'll use the heightweight data set and create a base plot called `hw_sp` (for heightweight scatter plot)
hw_sp <- ggplot(heightweight, aes(x = ageYear, y = heightIn))
hw_sp +
  geom_point() +
  stat_smooth(method = lm, se = FALSE)
`geom_smooth()` using formula = 'y ~ x'
# 95% confidence region (level = 0.95 is the default)
hw_sp +
  geom_point() +
  stat_smooth(method = lm, level = 0.95)
`geom_smooth()` using formula = 'y ~ x'
Customizing labels and title
hw_sp +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
labs(x = "Age",
y = "Height",
title = "Age vs Height") +
  theme_bw()
`geom_smooth()` using formula = 'y ~ x'
2.2 Line Graph
Basic Line Graph
ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line()
Adding Points to a Line Graph
ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line() +
  geom_point()
Changing the Appearance of Lines
# `linewidth` replaces `size` for lines as of ggplot2 3.4.0
ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line(linetype = "dashed", linewidth = 1, colour = "blue")
Making a Line Graph with Multiple Lines
In addition to the variables mapped to the x- and y-axes, map another (discrete) variable to colour or linetype.
tg
  supp dose length
1 OJ 0.5 13.23
2 OJ 1.0 22.70
3 OJ 2.0 26.06
4 VC 0.5 7.98
5 VC 1.0 16.77
6 VC 2.0 26.14
ggplot(tg, aes(x = dose, y = length, colour = supp)) +
  geom_line()
2.3 Bar Graphs
Bar graphs are perhaps the most commonly used kind of data visualization. They’re typically used to display numeric values (on the y-axis), for different categories (on the x-axis).
Basic Bar Graph
pg_mean
  group weight
1 ctrl 5.032
2 trt1 4.661
3 trt2 5.526
ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col()
Add Colors to the Bars
ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col(fill = "purple", colour = "black")
Adjusting Bar Width and Spacing
Narrower Bars
ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col(width = 0.5)
Wider Bars
# width = 0.8 is wider than the 0.5 above (the default is 0.9)
ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col(width = 0.8)
Grouping Bars Together
In this example we’ll use the cabbage_exp data set, which has two categorical variables, Cultivar and Date, and one continuous variable, Weight:
cabbage_exp
  Cultivar Date Weight sd n se
1 c39 d16 3.18 0.9566144 10 0.30250803
2 c39 d20 2.80 0.2788867 10 0.08819171
3 c39 d21 2.74 0.9834181 10 0.31098410
4 c52 d16 2.26 0.4452215 10 0.14079141
5 c52 d20 3.11 0.7908505 10 0.25008887
6 c52 d21 1.47 0.2110819 10 0.06674995
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_col(position = "dodge")
Making a Stacked Bar Graph
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_col()
Making a Proportional Stacked Bar Graph
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_col(position = "fill")
Adding Labels to a Bar Graph
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
geom_col(position = "dodge") +
geom_text(
aes(label = Weight),
colour = "black", size = 3,
vjust = 1.5, position = position_dodge(0.9)
) +
labs(x = "Date",
y = "Weight",
       title = "Grouped Bars with Labels")
2.4 Summarized Data Distributions
Histogram
library(MASS)
Attaching package: 'MASS'
The following object is masked from 'package:dplyr':
select
ggplot(birthwt, aes(x = bwt)) +
  geom_histogram(fill = "purple", colour = "black")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
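The message above suggests choosing a bin width explicitly. A sketch, assuming ggplot2 is installed; the width of 250 (grams, the unit of bwt) is an illustrative choice. Referring to the data as MASS::birthwt avoids attaching MASS, which, as its start-up message shows, masks select() from dplyr.

```r
library(ggplot2)

# Each bin now spans 250 g of birth weight
p <- ggplot(MASS::birthwt, aes(x = bwt)) +
  geom_histogram(binwidth = 250, fill = "purple", colour = "black")

print(p)
```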
Making Multiple Histograms from Grouped Data
birthwt_mod <- birthwt
# Convert smoke to a factor and reassign new names
birthwt_mod$smoke <- recode_factor(birthwt_mod$smoke, '0' = 'No Smoke', '1' = 'Smoke')
ggplot(birthwt_mod, aes(x = bwt)) +
geom_histogram(fill = "purple", colour = "black") +
  facet_grid(smoke ~ .)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Making a Basic Box Plot
ggplot(birthwt, aes(x = factor(race), y = bwt)) +
  geom_boxplot()
Making a Density Plot of Two-Dimensional Data
# Save a base plot object
faithful_p <- ggplot(faithful, aes(x = eruptions, y = waiting))
faithful_p +
  geom_point() +
  stat_density2d()
# Contour lines, with "height" mapped to color
# (after_stat() replaces the `..level..` notation deprecated in ggplot2 3.4.0)
faithful_p +
  stat_density2d(aes(colour = after_stat(level)))
# Map density estimate to fill color
faithful_p +
  stat_density2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE)
2.5 Visualization and Relationship
view(mpg)
A data frame with 234 rows and 11 variables:
- manufacturer: manufacturer name
- model: model name
- displ: engine displacement, in litres
- year: year of manufacture
- cyl: number of cylinders
- trans: type of transmission
- drv: the type of drive train, where f = front-wheel drive, r = rear wheel drive, 4 = 4wd
- cty: city miles per gallon
- hwy: highway miles per gallon
- fl: fuel type
- class: “type” of car
Does engine size have a relationship with fuel efficiency?
mpg %>%
ggplot(aes(displ,cty)) +
  geom_point()
View the plot according to some category
mpg %>%
  ggplot(aes(displ,cty)) +
  geom_point(aes(colour = drv))
Add a trend line to the plot
mpg %>%
  ggplot(aes(displ,cty)) +
  geom_point(aes(colour = trans)) +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Linearize the relationship
mpg %>%
ggplot(aes(displ,cty)) +
geom_point(aes(colour = trans)) +
  geom_smooth(method = lm)
`geom_smooth()` using formula = 'y ~ x'
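The straight line drawn by geom_smooth(method = lm) is an ordinary least-squares fit of y on x. The same model can be fitted directly with lm() to read off its coefficients. A sketch, assuming ggplot2 is installed (it supplies the mpg data); `fit` is an illustrative name.

```r
# Same model as the trend line: city mileage as a linear function of engine size
fit <- lm(cty ~ displ, data = ggplot2::mpg)

# Intercept and slope of the fitted line
coef(fit)

# Full summary, including R-squared
summary(fit)
```

The sign of the displ coefficient gives the direction of the trend: a negative slope means city mileage falls as engine size grows.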
mpg %>%
ggplot(aes(displ,cty)) +
geom_point(aes(colour = trans)) +
geom_smooth(method = lm) +
  facet_wrap(~drv)
`geom_smooth()` using formula = 'y ~ x'
Customizing labels and title
mpg %>%
ggplot(aes(displ,cty)) +
geom_point(aes(colour = trans)) +
geom_smooth(method = lm) +
facet_wrap(~drv) +
  labs(x = "engine size",
       y = "City Miles per Gallon",
       title = "Fuel Efficiency") +
  theme_bw()
`geom_smooth()` using formula = 'y ~ x'
Saving your plot
mpg %>%
ggplot(aes(displ,cty)) +
geom_point(aes(colour = trans)) +
geom_smooth(method = lm) +
facet_wrap(~drv) +
  labs(x = "engine size",
       y = "City Miles per Gallon",
       title = "Fuel Efficiency") +
  theme_bw()
`geom_smooth()` using formula = 'y ~ x'
ggsave('mpg.pdf')
Saving 7 x 5 in image
`geom_smooth()` using formula = 'y ~ x'
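By default ggsave() saves the last plot displayed and picks the file type from the extension. Passing the plot and its dimensions explicitly is more predictable in scripts. A sketch, assuming ggplot2 is installed; the object names and the temporary output path are illustrative choices.

```r
library(ggplot2)

# Store the plot in an object instead of relying on "the last plot"
p <- ggplot(mpg, aes(displ, cty)) +
  geom_point(aes(colour = drv)) +
  theme_bw()

# Write to a temporary directory; replace with your own path as needed
out_file <- file.path(tempdir(), "mpg.pdf")
ggsave(out_file, plot = p, width = 7, height = 5)  # width and height in inches
```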
Other task: Animated chart
# Load libraries
library(gganimate)
p <- ggplot(mpg, aes(displ, cty,colour = trans)) +
geom_point() +
scale_x_log10() +
theme_bw() +
labs(title = 'Year: {frame_time}', x = 'Engine Size', y = 'City Miles per Gallon') +
transition_time(year) +
ease_aes('linear')
# Render the animation and save it as a GIF
anim <- animate(p)
anim_save("gif_chart.gif", animation = anim)
3.0 Task
Get data of your interest.
Make 5 or 6 plots to tell us about your data.
Customize your plots: make changes to colour, title, axes, etc.
Further Reading
Different types of geom https://ggplot2.tidyverse.org/reference/index.html
Customizing your plots: Making changes to plot, add title, change themes etc http://www.cookbook-r.com/Graphs/
More advanced R graphs https://www.r-graph-gallery.com/